Extraction of Polish Named-Entities
Abstract
Introduction

Named entities (NE) constitute a significant part of natural language texts and are widely exploited in various NLP applications. Although considerable work on named-entity recognition (NER) exists for a few major languages, research on this topic in the context of Slavonic languages has been almost neglected. (Slavonic languages constitute a large group of the Indo-European language family and are further split into West, East and South Slavonic subgroups.) Some NER systems for Bulgarian and Russian, constructed by adapting the well-known information extraction platform GATE (Cunningham et al.), were presented at a recent IESL workshop (Cunningham et al.). In this paper we present some attempts towards constructing a NER system for Polish, built on top of SProUT (Becker et al.; Drożdżyński et al.), a novel general-purpose multilingual NLP platform, and by deploying standard machine learning techniques. (SProUT stands for Shallow Processing with Typed Feature Structures and Unification.)

Polish is a West Slavonic language and, like other languages in the group, it exhibits a highly inflectional character (e.g., nouns and adjectives decline in seven cases) and has a relatively free word order (Świdziński and Saloni). Due to these specifics and the general lack of linguistic resources for Polish, constructing a NER system for Polish is an intriguing task.

SProUT
System overview

Analogously to the widely known GATE system, SProUT is equipped with a set of reusable, Unicode-capable online processing components for basic linguistic operations, including tokenization, sentence splitting, morphological analysis, gazetteer lookup and reference matching. Since typed feature structures (TFS) are used as a uniform data structure for representing the input and output of each of these processing resources, they can be flexibly combined into a pipeline that produces several streams of linguistically annotated structures, which serve as input for the shallow grammar interpreter applied at the next stage.

The grammar formalism in SProUT is a blend of very efficient finite-state techniques and unification-based formalisms, which are known to guarantee transparency and expressiveness. To be more precise, a grammar in SProUT consists of pattern/action rules, where the LHS of a rule is a regular expression over TFSs with functional operators and coreferences, representing the recognition pattern, and the RHS of a rule is a TFS specification of the output structure. Coreferences express structural identity, create dynamic value assignments, and serve as a means of information transport into the output descriptions. Functional operators provide a gateway to the outside world. They are primarily utilized for forming the output of a rule (e.g., concatenation of strings) and for introducing complex constraints in the rules, where they can act as predicates that produce Boolean values. Furthermore, grammar rules can be recursively embedded, which in fact provides grammarians with a context-free formalism. The following rule for the recognition of prepositional phrases gives an idea of the syntax of the grammar formalism:
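The SProUT rule itself is not reproduced in this excerpt. The block below is therefore only a minimal Python sketch of the pattern/action idea described above, not SProUT's rule syntax or API. It assumes flat dictionaries in place of typed feature structures, a fixed sequence of patterns in place of a full regular expression, and hand-written morphological tags. The names `Var`, `match_fs`, `concat`, `apply_rule`, `PP_LHS` and `pp_rhs`, and the features `pos`, `surface`, `case` and `subcat`, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A coreference tag: every occurrence must bind to the same value."""
    name: str


def match_fs(pattern, fs, bindings):
    """Match one flat feature structure against a pattern.

    Returns the extended bindings on success, or None on failure.
    """
    new = dict(bindings)
    for feat, want in pattern.items():
        if feat not in fs:
            return None
        have = fs[feat]
        if isinstance(want, Var):
            if want.name in new and new[want.name] != have:
                return None  # coreference clash, e.g. case disagreement
            new[want.name] = have
        elif want != have:
            return None
    return new


def concat(*parts):
    """Stand-in for a functional operator that builds an output string."""
    return " ".join(parts)


def apply_rule(lhs, rhs, tokens):
    """Try the rule at every start position; lhs is a sequence of patterns."""
    results = []
    for start in range(len(tokens) - len(lhs) + 1):
        bindings = {}
        for pattern, tok in zip(lhs, tokens[start:start + len(lhs)]):
            bindings = match_fs(pattern, tok, bindings)
            if bindings is None:
                break
        else:
            results.append(rhs(bindings))
    return results


# LHS: preposition + adjective + noun, all sharing one case value.
PP_LHS = [
    {"pos": "prep", "surface": Var("prep"), "subcat": Var("case")},
    {"pos": "adj", "surface": Var("adj"), "case": Var("case")},
    {"pos": "noun", "surface": Var("noun"), "case": Var("case")},
]


def pp_rhs(b):
    """RHS: output structure filled from the coreference bindings."""
    return {"type": "pp", "case": b["case"],
            "form": concat(b["prep"], b["adj"], b["noun"])}


# Hand-annotated tokens for "w nowym domu" (assumed tags, ambiguity ignored).
tokens = [
    {"surface": "w", "pos": "prep", "subcat": "loc"},
    {"surface": "nowym", "pos": "adj", "case": "loc"},
    {"surface": "domu", "pos": "noun", "case": "loc"},
]
phrases = apply_rule(PP_LHS, pp_rhs, tokens)
```

In this sketch the shared `Var("case")` plays the role of a coreference: it enforces agreement across the three tokens and carries the case value into the output structure, while `concat` stands in for a functional operator that forms part of the output.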
Similar resources
Towards the Annotation of Named Entities in the National Corpus of Polish
We present the named entity annotation task within the on-going project of the National Corpus of Polish. To the best of our knowledge, this is the first attempt at a large-scale corpus annotation of Polish named entities. We describe the scope and the TEI-inspired hierarchy of named entities admitted for this task, as well as the TEI-conformant multi-level stand-off annotation format. We also ...
Lexicons and Grammars for Named Entity Annotation in the National Corpus of Polish
We present initial results in the named entity annotation subtask of a project aiming at creating the National Corpus of Polish. We summarize the annotation requirements defined for this corpus, and we discuss how existing lexical resources and grammars for Polish named entities have been adapted to meet those requirements. We show first results of the corpus annotation using the information extra...
Evaluation of Coreference Resolution Tools for Polish from the Information Extraction Perspective
In this paper we discuss the performance of existing tools for coreference resolution for Polish from the perspective of information extraction tasks. We take into consideration the source of mentions, i.e., gold standard vs mentions recognized automatically. We evaluate three existing tools, i.e., IKAR, Ruler and Bartek on the KPWr corpus. We show that the widely used metrics for coreference e...
Dependency-based Extraction of Entity-relationship Triples from Polish Open-domain Texts
We present a prototype system for extracting arbitrary relations between named entities from open-domain texts in Polish based on DEBORA – a dependency-based approach to the problem. The presented method is designed for the purpose of the conducted experiment and is adapted to morpho-syntactic properties of Polish, e.g. free word order, high degree of morphological marking. Our preliminary resu...
DEBORA: Dependency-Based Method for Extracting Entity-Relationship Triples from Open-Domain Texts in Polish
We present DEBORA – a dependency-based approach to the problem of extraction of arbitrary relations between named entities from open-domain texts in Polish. The presented method is designed for the purpose of the conducted experiment and is adapted to morpho-syntactic properties of Polish, e.g. free word order, high degree of morphological marking. Our preliminary results show that the method i...
Resources for Information Extraction from Polish texts
The paper presents a collection of resources developed for Information Extraction (IE) from Polish texts. In particular, we mention two IE platforms adapted to Polish and several IE applications built on top of one of them: named entity recognition, creation of terminology lexicons, and data extraction from medical texts.